\chapter{Optimizations on intermediate representation}

\section{Reducing an instruction graph}
\label{mir-optimizations-reduction}

As a result of certain local optimizations, it is possible that a well
formed instruction graph nevertheless has instructions that are not
reachable from the initial instruction.  Removing such unreachable
instructions is done by \emph{reducing} it.

To reduce an instruction graph, first the set of all reachable
instructions is computed.  Then, for each reachable instruction $i$,
its predecessors are examined.  If $i$ has a predecessor $p$ that is
not a member of the set of reachable instructions, then $p$ is removed
as described below.  Notice that $i$ must have at least one reachable
predecessor, so $i$ must have at least two predecessors, one of which
is $p$.  We remove $p$ as follows:

\begin{itemize}
\item If $i$ is a \texttt{phi-instruction}, then determine the
  position $k$ of $p$ among the predecessors of $i$.  Remove the
  $k$:th input from $i$ and the following \texttt{phi-instruction}s
  belonging to the same \emph{cluster}.  If this operation results in
  the \texttt{phi-instruction}s in the cluster having a single input,
  then replace each \texttt{phi-instruction} in the cluster by an
  \texttt{assignment-instruction}.  Finally remove the $k$:th
  predecessor of $i$.
\item If $i$ is not a \texttt{phi-instruction}, simply remove $p$ from
  the list of predecessors of $i$.
\end{itemize}

\Defun {cleavir-ir:set-predecessors} {initial-instruction}

Implements reduction as described above.\fixme{Not sure it handles phi instructions correctly.}

\section{Path replication}
\label{hir-optimizations-path-replication}

Certain optimizations and other transformations consist of eliminating
redundant tests.  These optimizations are of great importance because
of the potentially high cost of such tests, especially if the test is
in a loop and it is loop invariant.  In contrast to other redundant
computations, eliminating redundant tests is not just a matter of
replacing a computation by a reference to a variable.  Instead, the
outcome of the test must be remembered by replicating some
instructions.

Redundant tests occur frequently, not because the programmer wrote
redundant code, but because they are introduced by different
\commonlisp{} operators, especially when the code for these operators
is inlined.  Consider the two very common \commonlisp{} functions
\texttt{car} and \texttt{cdr}.  These functions are the lowest-level
functions on \texttt{cons} cells.  The code for these functions must
first test whether the argument is a \texttt{cons} cell.  If the user
code contains both a call to \texttt{car} and a call to \texttt{cdr}
in the same execution path, then the second such test is redundant
since the outcome is the same.  Redundant tests occur in other,
similar, situations such as tests to determine whether a variable
contains a \texttt{fixnum} as part of arithmetic operation on that
variable.

The main problem we must solve in order to remove these redundant
tests is that the execution paths will merge after the first
occurrence of the test, so that the outcome of the test is not known
at the point of the second occurrence of the test.  Consider the
following code that might exist explicitly or implicitly as a result
of \emph{destructuring}:

\begin{verbatim}
(let ((x (car w))
      (y (cdr w)))
  ...)
\end{verbatim}

\refFig{fig-path-merging} shows the equivalent HIR code.

\begin{figure}
\begin{center}
\inputfig{fig-path-merging.pdf_t}
\end{center}
\caption{\label{fig-path-merging}
Path merging.}
\end{figure}

In \refFig{fig-path-merging}, \texttt{consp} is an instruction that
tests whether the input is a \texttt{cons} cell and branches
accordingly.  The instructions \texttt{car} and \texttt{cdr}
correspond to the \commonlisp{} operations with the same name, except
that they assume that the input has been determined to be a
\texttt{cons} cell.  The cloud-shaped computations correspond to tests
for \texttt{w} being \texttt{nil} and to a call to \texttt{error} if
those tests fail.

As can be seen in \refFig{fig-path-merging}, before the second
\texttt{consp} instruction, the execution paths merge, so that at that
point it is not known whether \texttt{w} contains a \texttt{cons} cell
or something else.  The situation is complicated by the fact that
there may be an arbitrary number of instructions between the point
where the paths merge and the second \texttt{consp} instruction.
There can even be arbitrary branching and loops among those
instructions.  The transformation describe in this section replicates
all those instructions so that the second \texttt{consp} instruction
can be eliminated.

For this transformation to be applicable, there must not be an
assignment to the variable being tested between the two tests.
Furthermore, the first test must dominate the second test.

The transformation is accomplished by local rewrite rules applying to
the second test, or to replicas of that test.

\section{Static single assignment form}
\label{mir-optimizations-ssa-form}

While not an optimization in itself, static single assignment (SSA)
form is a prerequisite for many optimization techniques, which is why
we describe it here.

In their conference paper from 2013
\cite{Braun:2013:SEC:2450247.2450258}, Braun et al present an
algorithm capable of a direct translation of abstract syntax trees to
SSA form.  We do not use it here, because we have not been able to
couple it with our compilation context which seems to require
compilation from the end to the beginning of the program.

Instead, we use the traditional technique based on \emph{iterated
  dominance frontiers}.

Muchnick \cite{Muchnick:1998:ACD:286076} discusses SSA form, but the
description is very sketchy.  He describes using iterated dominance
frontiers to find nodes where $\phi$ functions must be inserted, but
he does not discuss how to rename variables.  He also does not justify
why the start node of the control flow graph is included in the
argument to each computation using iterated dominance frontiers.  His
notation is essentially the same as that of Cytron et al
\cite{Cytron:1991:ECS:115372.115320}.  For these reasons, we base our
description on the paper by Cytron et al, rather than on Muchnick's
book.

In fact, there are two papers by Cytron et al that give algorithms for
inserting $\phi$ functions.  The first one was published in 1989 in
POPL \cite{Cytron:1989:EMC:75277.75280} and the second one in 1991 in
TOPLAS \cite{Cytron:1991:ECS:115372.115320}.

All research we have seen related to SSA form assumes that functions
can not be nested.  It is not even clear what it would mean to convert
the following code fragment to SSA:

\begin{verbatim}
(let ((x ...))
  (flet ((f ()
            (incf x)))
    (f)
    (f)))
\end{verbatim}

We can see no direct way to convert the body of the local function
\texttt{f} to SSA form.%
\footnote{If the function \texttt{f} is \emph{inlined}, it is not hard
  of course.}

Until further research proves otherwise, we must consider SSA to be a
per-variable property, so that some variables can not be considered
candidates for SSA transformation.

Having said that, we can apply some \emph{tricks} to make it work.  In
the program previous program fragment, we could alter the signature of
the local function \texttt{f} as follows:

\begin{verbatim}
 (let ((x1 ...))
   (flet ((f (x)
             (+ x 1)))
     (let ((x2 (f x1)))
       (f x2))))
\end{verbatim}

This trick is called \emph{lambda lifting} and it consists of adding
extra arguments to a function in order that it can be defined in the
null lexical environment.

Since lambda lifting alters the signature of the function, it can not
be used when the function can be called from an external context that
assumes the original signature.

Consider the following program fragment:

\begin{verbatim}
(let ((x 0))
   (find-if (lambda (y) (incf x) (> y 10)) some-sequence)
   ...)
\end{verbatim}

This program fragment counts the number of occurrences in
\texttt{some-sequence} that precede the first occurrence greater than
$10$.  Let's say that following the call to \texttt{find-if}, the
variable \texttt{x} is used in many complex ways so that it is
justified to apply SSA conversion to it.  Then we can use the
following trick:

\begin{verbatim}
 (let ((x1 0))
   (flet ((f (z)
             (find-if (lambda (y) (incf z) (> y 10)) some-sequence)
             z))
     (let ((x2 (f x1)))
       ...)))
\end{verbatim}

The trick consists of making a local version of \texttt{x}, named
\texttt{z} and have the closure update that local version instead.
Now, of course, \texttt{z} can not be SSA converted, but \texttt{x}
can as the transformed code fragment shows.  The reason this trick
works is that we know that \texttt{find-if} does not hold on to the
closure after it returns.  We know that it calls the closure a certain
number of times, and then it returns without keeping a reference to
it.%
\fixme{Find out whether there is a commonly used name for such
  functions, and if so use it.  Otherwise, invent one.}

We can see this trick as a transformation followed by a lambda
lifting.  The first transformation consists of introducing the local
function \texttt{f} as follows:

\begin{verbatim}
(let ((x 0))
  (flet ((f ()
            (find-if (lambda (y) (incf x) (> y 10)) some-sequence)))
    (f)
    ...))
\end{verbatim}

Now, lambda-lifting the function \texttt{f} and converting to SSA
gives the final result.

However, there are some situations where we can not apply SSA
transformation to variables:%
\footnote{We should be more specific here: There are some situations
  where we can not see how it could be possible to apply SSA
  transformation.  Future research into new techniques might come up
  with some version of SSA that has the desired properties and that
  will lend itself to these situations.}

Consider the following program fragment:

\begin{verbatim}
(let ((x ...))
   (f (lambda () (incf x)))
   (g (lambda () (incf x))))
\end{verbatim}

In this program fragment, \texttt{f} and \texttt{g} are arbitrary
user-defined functions that may change after the fragment is compiled,
so we can not make any assumptions about them.  In particular, it is
entirely possible that \texttt{f} and \texttt{g} both capture the
closure in global variables like this:

\begin{verbatim}
(defun f (closure)
  (setf *f* closure))

(defun g (closure)
  (setf *g* closure))
\end{verbatim}

In order for this program fragment to respect the semantics of \commonlisp{}
it can be argued that both closures must update a common variable.  It
follows that both closures must assign to this common variable, and
thus that this variable can not be assigned to in a single place as
required by SSA.

Some optimizations are possible even if not every variable can be
converted into SSA form, so that some variables are assigned multiple
times.

In order for a variable $V$ to respect the SSA property, a $\phi$
function for $V$ is required in control flow node $Z$ whenever there
are control flow nodes $X$ and $Y$ containing assignments to $V$, $X
\ne Y$, $X \rightarrow^+ Z$, $Y \rightarrow^+ Z$.  The node $Z$ is
called a \emph{join point} for $V$.  Since adding a $\phi$ function in
$Z$ introduces an assignment to $V$, $Z$ must then recursively be
considered in order to find other join points requiring additional
$\phi$ functions.  The concept of a join point is more generally
defined for an arbitrary set of control flow nodes $S$.  The set of
join points $J(S)$ is defined as the set of all nodes $Z$ such that
there are two non-null paths starting from two distinct nodes $X$ and
$Y$ in S and converge in $Z$.  The \emph{iterated join} $J^+(S)$ is
defined as the limit of the increasing sets of nodes $J_1(S) = J(S)$,
$J_{i+1}(S) = J(S \cup J_i(S))$.

Consider the control flow graph in \refFig{fig-ssa-join-1}.  Empty
boxes contain neither assignments nor references to the variable
\texttt{x}.  There are three nodes containing assignments to
\texttt{x}, namely B, C, and D.  The set $J^+(\{B, C, D\}) = \{E\}$.

\begin{figure}
\begin{center}
\inputfig{fig-ssa-join-1.pdf_t}
\end{center}
\caption{\label{fig-ssa-join-1}
Example of join point.}
\end{figure}

Now, to compute the nodes where a $\phi$ function is required, Cytron
et al use the concept of \emph{iterated dominance frontier}.  First,
the dominance frontier of a single node $x$, written $DF(x)$ is the
set of nodes $y$ such that $x$ dominates an immediate predecessor of
$y$ but $x$ does not strictly dominate $y$.  The dominance frontier of
a \emph{set} of nodes $S$ is the natural extension of $DF$ to a set,
namely the union of the vales for each element.  The iterated
dominance frontier is defined in a way similar to the iterated join.
$DF^+(S)$ is defined as the limit of the
increasing sets of nodes $DF_1(S) = DF(S)$, $DF_{i+1}(S) = DF(S \cup
DF_i(S))$.

Consider again the control flow graph in \refFig{fig-ssa-join-1}.  The
set $DF^+(\{B, C, D\}) = \{E, F\}$.  Note that this set includes the
node $F$ which is not a join point for the nodes that assign to $x$.
This discrepancy is clearly indicated in the the paper by Cytron et
al.  If we analyze it a bit more, we see that clearly no $\phi$
function is required in $F$.  The method using iterated dominance
frontiers thus computes more nodes than required.

However, notice that if \refFig{fig-ssa-join-1} represents a
\commonlisp{} program, then the node $F$ could not contain a reference
to \texttt{x}, simply because no \commonlisp{} construct allows the
creation of an uninitialized lexical variable.  Therefore, in $F$, the
variable \texttt{x} must be \emph{dead}.  The situation in
\refFig{fig-ssa-join-1} can be quite common in \commonlisp{} because
variables often have very limited scope.  It is therefore desirable to
avoid considering the node $F$ as needing a $\phi$ function.

Now consider the control flow graph in \refFig{fig-ssa-join-2}.  The
only difference from \refFig{fig-ssa-join-1} is that node $E$ does not
contain a use of the variable \texttt{x}.

\begin{figure}
\begin{center}
\inputfig{fig-ssa-join-2.pdf_t}
\end{center}
\caption{\label{fig-ssa-join-2}
Example of join point.}
\end{figure}

Clearly, the node $E$ is in the set $J^+(\{B, C, D\})$, even though
the variable \texttt{x} is dead in $E$.  Hence, it is not sufficient
to consider an algorithm based directly on join sets rather than on
iterated dominance frontiers in order to avoid including nodes where
variables are dead.  However, checking variable liveness before
including a node in the calculation of the iterated dominance frontier
takes care of problems generated by both the situation illustrated by
\refFig{fig-ssa-join-1} and by \refFig{fig-ssa-join-2}.  A paper by
Choi et al \cite{Choi:1991:ACS:99583.99594} describes how to modify
the algorithm for placing $\phi$ functions so that only nodes where
the variable is live are affected.  In fact the paper by Choi et al
describes a modification to the algorithm in the POPL paper by Cytron
et all \cite{Cytron:1989:EMC:75277.75280}, and the TOPLAS paper by
Cytron et all \cite{Cytron:1991:ECS:115372.115320} refers to the paper
by Choi et al.  The algorithm used in \sysname{} is the modified
algorithm, except that liveness is determined by a function passed as
an argument.  Passing \texttt{(constantly t)} as an argument will
yield the ordinary SSA form and passing a true liveness test will
yield the \emph{pruned} SSA form.

To compute the immediate dominator of every node in the flow graph, we
use the algorithm by Langauer and Tarjan
\cite{Lengauer:1979:FAF:357062.357071}.  They present two versions of
their algorithm; one that with complexity
$O(m \thinspace\textsf{log}\thinspace n)$ time and one with complexity
$O(m \thinspace\alpha (m, n))$ time, where $m$ is the number of edges
in the flow graph, $n$ is the number of nodes, and $\alpha$ is the
inverse of Ackermann's function.  A more complicated algorithm that
runs in linear time was discovered by Harel
\cite{Harel:1985:LAF:22145.22166}.

\section{Type inference}
\label{sec-optimizations-type-inference}

Type inference is accomplished by formulating it as a \emph{forward
  data flow} problem.  The best result is obtained for variables that
respect the \emph{static single assignment} property as described in
\refSec{mir-optimizations-ssa-form}.

Every \emph{arc} of the flow graph is associated with a \empty{type
  map} for each lexical variable that is \emph{live} at that arc.  The
type map describes all possible types for the variable at that arc.
In many cases, the type map can not be precise.  If so, then it must
contain a \emph{superset} of the possible types.  The exact
representation of the type map is implementation-dependent.

Initially a fictive arc occurring before the initial
\texttt{enter-instruction} of the program is considered to have no
type maps associated with it, since no variables are live there.

The propagation of type maps depends on the type of each instruction.
In particular, the \texttt{typeq-instruction} propagates the
\emph{and} of the existing type map for a variable and the type
specifier of its second input to the first successor arc, and the
\emph{difference} to the second successor arc.

When a \texttt{typeq-instruction} has an outgoing arc containing the
type \texttt{nil} for some variable, this means that control can not
pass through that arc.  We say that such an arc is \emph{dead}.
The dead arc becomes the outgoing arc of a
\texttt{nop-instruction} without a predecessor, and the
\texttt{typeq-instruction} itself is replaced by a
\texttt{nop-instruction}.%
\fixme{This explanation is not great.  Improve it!}
Notice that dead arcs can not simply be removed, because the result
would then be an instruction graph that is not \emph{well formed} as
described in \refChap{chap-ir}.  After this type of transformation,
the instruction graph should be
\emph{reduced}. \seesec{mir-optimizations-reduction}

\section{Escape analysis}

A top-level function can be seen as a tree of nested functions.  Each
function $f$ in the tree executes \texttt{enclose} instructions in
order to create a \emph{closure} out of the code of the functions that
are immediately contained in $f$.  An \emph{escape analysis}
determines what happens to those closures.

Each \texttt{enclose} instruction creates a fresh lexical variable
that is owned by the function executing the instruction.  The question
is what then happens to the contents of that variable.  By
construction, the variable is never assigned to.  Some data-flow
analysis is needed to determine what can happen to that value.  For
now, let us say that there are only two possibilities:

\begin{enumerate}
\item It is used only as the function argument in a \texttt{call}
  instruction.
\item It is used in some other way as well, such as being passed as
  the argument to another function, directly or indirectly.
\end{enumerate}

By traversing the function tree from the top-level to the leaves, and
doing the data-flow analysis at each level, we obtain a \emph{prefix}
of the tree, containing functions that are only called.  Let us start
by examining what can be done with these functions.

%%  LocalWords:  optimizations inlined cdr fixnum destructuring
